Abstract: We analyze the power allocation problem for orthogonal
multiple access channels by means of a non-cooperative
potential game in which each user distributes his power over the 
channels available to him. When the channels are static, we show 
that this game possesses a unique equilibrium; moreover, if the 
network's users follow a distributed learning scheme based on 
the replicator dynamics of evolutionary game theory, then they 
converge to equilibrium exponentially fast. On the other hand, 
if the channels fluctuate stochastically over time, the associated 
game still admits a unique equilibrium, but the learning process 
is not deterministic; just the same, by employing the theory of 
stochastic approximation, we find that users still converge to 
equilibrium. 

Our theoretical analysis hinges on a novel result which is of 
independent interest: in finite-player games which admit a (possibly
nonlinear) convex potential, the replicator dynamics converge
to an $\varepsilon$-neighborhood of an equilibrium in time $O(\log(1/\varepsilon))$.

Index Terms — Nash equilibrium; potential games; parallel 
multiple access channel; power allocation; replicator dynamics. 



I. Introduction 

In view of the decentralized nature of future and emerging
wireless networks, non-cooperative game theory has
become an important tool to analyze distributed problems 
in networks whose nodes cannot be assumed to adhere to 
centrally controlled protocols. The main goal has been to 
develop policies and algorithms that nodes can use to optimize 
their resources (power, bandwidth, etc.) on their own, so, 
following [1], the questions that arise are a) whether there 
exist "equilibrial" policies which are stable against unilateral
deviations; b) whether these (Nash) equilibria are unique;
and c) whether they can be reached by distributed learning 
algorithms that require only local information. 

Accordingly, an important paradigm which has attracted
significant interest in the wireless communications literature 
concerns the allocation of power over orthogonal communi- 
cation channels [1], [2], [3]. From a centralized viewpoint, 
this is a relatively well-studied subject, especially with respect 
to optimal power allocation schemes which allow users to 
reach the boundary of the rate region assuming full channel 
knowledge and central control [4], [5]. On the other hand, 
more recent examinations [6], [7], [8] focus on socially stable 
power allocation policies because, even if the globally optimal, 
capacity-achieving power profile is known and used, it might 
be unstable under deviations by selfish users (and thus useless 
in a decentralized setting).

In this paper, we consider the problem of uplink commu- 
nication in multi-user networks consisting of several receivers 
that operate on distinct, non-interfering channels, and we focus 
on giving definitive answers to points (a)-(c) above, analyzing 
the equilibrial structure of the problem and its convergence 
aspects. Despite its apparent simplicity, this parallel multiple 
access channel (PMAC) model has several relevant applications 
such as, for instance, in 802.11-based wireless local area
networks (WLANs) with non-overlapping channels [9], [10],
distributed soft handoffs in cellular systems [11], distributed
power allocation in digital subscriber lines (DSLs) [12], and,
finally, in throughput-maximizing power control in multi-carrier
code division multiple access (MC-CDMA) systems [13].

Our analysis will focus on the single-user decoding (SUD)
scheme where the transmitted signal of each user is decoded
separately by the receiver(s) who treat the incoming signal of
other users as additive (Gaussian) noise. The main reason for
using SUD instead of successive interference cancellation (SIC)
is that the former is known to have lower decoding complexity
and signalling overhead than the latter - a consequence of SUD
not having to broadcast the decoding order to the transmitters
[14]. As a result, SIC-based schemes suffer from scalability
issues, especially when there are several receivers and/or the
channel is highly time-varying.

In this context, non-cooperative power allocation games for 
static Gaussian interference channels (ICs) have been studied
in a series of related papers [7], [8]. There, the existence of a 
Nash equilibrium is a consequence of the convexity properties 
of the users' achievable rates and follows from the general 
theory of [15]. In fact, under suitable (but stringent) conditions 
on the channel matrices, it was shown that this equilibrium is 
unique and that iterative water-filling algorithms converge to 
it. 

Formally, the static PMAC is a special case of this IC framework,
but the conditional analysis of [7], [8] almost always
fails for static PMAC models. Thus, although the (global) 
capacity region of this channel is well-understood [4], [5], [2], 
the channel's distributed version remains unresolved. A first 
attempt to remedy this was carried out in [11] where it was 
shown that an associated power allocation game admits an ex- 
act potential function [16] whose minimum corresponds to the 
game's Nash equilibria. However, this potential is not strictly 
convex, so Nash equilibrium uniqueness might fail along with 
the uniqueness conditions of [7], [8]. Rather surprisingly, it 
turns out that this is not the case: even though these conditions 
do not hold in the static PMAC context, the Nash equilibrium
of the static PMAC game is unique (Theorem 1). 

In itself, uniqueness allows us to characterize the system's 
behavior at equilibrium, but it does not provide a way of 
actually getting there. Regarding such convergence issues, 
the authors of [6] considered a single channel with pricing 
and exhibited power control algorithms which converge to 
equilibrium under "mild-interference" conditions. Similarly, 
one of the main results of [12] was to show that if the transmitters
know the local channel state and the overall interference-plus-noise
covariance matrix, then, subject to similar "mild-interference"
conditions, iterative water-filling converges to the
equilibrium set of the game (a result which was then enhanced 
in [17] by dropping this condition for a modified water-filling 
scheme). 

Instead of taking a water-filling approach, we present a 
simpler learning scheme based on the replicator dynamics of
evolutionary game theory [18] which involves the same (often 
less) information from the side of the players, and which does 
not require them to solve a nonlinear water-filling problem. 
Dynamics of this type have been studied extensively in finite 
[19] and continuous population games [20], but, in nonlinear 
games (such as the one we have here), their properties are 
not as well understood. Nonetheless, by taking a modified 
version of the users' utilities, we show that the replicator 
dynamics converge to an $\varepsilon$-neighborhood of the game's (a.s.)
unique equilibrium exponentially fast, i.e. in time $O(\log(1/\varepsilon))$
(Theorem 3).

In the context of fading, however, the static game and
the corresponding replicator dynamics lose much of their 
relevance because variations due to fading open the door 
to stochasticity. To account for this randomness, we study 
both fast and block-fading models. Using techniques from 
the theory of stochastic approximation [21], we show that by 
properly adjusting their learning scheme, users converge to 
the (unique) equilibrium of an averaged game whose payoff 
functions correspond to the users' achievable ergodic rates 
(Theorems 4 and 5). 

Our convergence analysis is based on a novel game-theoretic 
result which is of independent interest: in games which admit 
a (star-)convex potential function, the replicator dynamics 
converge to (the game's unique) equilibrium at an exponential 
rate (Theorem 6). To the best of our knowledge, this is the 
fastest convergence rate that has been established for the 
replicator dynamics in the current state of the art [20]. 

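Notational Conventions: If $\mathbb{R}^{\mathcal{S}}$ is the vector space
spanned by the set $\mathcal{S} = \{s_\alpha\}_{\alpha=1}^{S}$ and
$\{e_\alpha\}_{\alpha=1}^{S}$ denotes its canonical basis, we will use
$\alpha$ to refer interchangeably to either $s_\alpha$ or $e_\alpha$, and
we will identify the set $\Delta(\mathcal{S})$ of probability measures on
$\mathcal{S}$ with the standard $(S-1)$-dimensional simplex of
$\mathbb{R}^{\mathcal{S}}$: $\Delta(\mathcal{S}) \equiv \{x \in \mathbb{R}^{\mathcal{S}} : \sum_\alpha x_\alpha = 1 \text{ and } x_\alpha \geq 0\}$.
Finally, we will employ Latin indices for players ($k, \ell, \dots$), while
reserving Greek ones ($\alpha, \beta, \dots$) for their ("pure") strategies;
and when summing over $\alpha \in \mathcal{A}_k$, we will simply write
$\sum_\alpha^k \equiv \sum_{\alpha \in \mathcal{A}_k}$.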

II. System Model

Following [11], the basic setup of our model is as follows: 
we consider a finite set $\mathcal{K} = \{1, \dots, K\}$ of wireless
single-antenna transmitters (the players of the game) who wish
to transmit to a group of single-antenna receivers (possibly
clustered as a single receiver). Each of these receivers operates
on a given channel $\alpha \in \mathcal{A} = \{1, \dots, A\}$ (assumed to be
orthogonal, typically in the frequency domain), and each user
$k \in \mathcal{K}$ may transmit over a subset $\mathcal{A}_k \subseteq \mathcal{A}$
of these channels (with $A_k = \mathrm{card}(\mathcal{A}_k) \geq 2$).

In particular, if $x_{k\alpha} \sim \mathcal{CN}(0, p_{k\alpha})$ is the transmitted
message of user $k$ on channel $\alpha \in \mathcal{A}_k$ and $h_{k\alpha}$ denotes
the respective channel coefficient, then the received signal on channel
$\alpha$ will be $y_\alpha = \sum_k h_{k\alpha} x_{k\alpha} + z_\alpha$, where
$z_\alpha \sim \mathcal{CN}(0, \sigma_\alpha^2)$ denotes the thermal noise.
Accordingly, user $k \in \mathcal{K}$ can split his transmitting power among
the channels $\alpha \in \mathcal{A}_k$ subject to the constraint:

  $\sum_\alpha^k p_{k\alpha} \leq P_k$,   (1)

where $p_{k\alpha} = \mathbb{E}[|x_{k\alpha}|^2]$ represents the power with which
the user transmits on channel $\alpha$, and $P_k$ is his maximum power.
As a result, the power allocation of user $k$ will be given by
the point $p_k = \sum_\alpha^k p_{k\alpha} e_{k\alpha} \in \mathbb{R}^{\mathcal{A}_k}$
and, analogously, the power profile which collects all users' power
allocations will be represented by
$p = (p_1, \dots, p_K) \in \prod_k \mathbb{R}^{\mathcal{A}_k} = \mathbb{R}^Q$
with $Q = \sum_k A_k$.

In this context, our performance metric will be the users'
achievable transmission rates, which depend on their signal to
interference-plus-noise ratio (SINR):

  $\mathrm{sinr}_{k\alpha}(p) = \dfrac{g_{k\alpha} p_{k\alpha}}{\sigma_\alpha^2 + \sum_{\ell \neq k} g_{\ell\alpha} p_{\ell\alpha}}$,   (2)

with $g_{k\alpha} = |h_{k\alpha}|^2$ denoting the channel gain coefficient of user
$k$ in channel $\alpha \in \mathcal{A}_k$. Clearly, the users' achievable rates will
depend on their power allocation policies through their SINR,
but the exact dependence hinges on the time-variability of the
channel gain coefficients $g_{k\alpha}$.
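
As a concrete illustration of (2), the following minimal sketch evaluates the SINR of a given user on a given channel. It assumes, for simplicity, that every user can access every channel and that powers and gains are stored as nested lists; the function name `sinr` and all numerical values are hypothetical and only serve the example.

```python
def sinr(p, g, noise, k, a):
    """SINR of user k on channel a, following eq. (2).

    p[l][a]  : power of user l on channel a
    g[l][a]  : channel gain |h|^2 of user l on channel a
    noise[a] : thermal noise variance on channel a
    """
    # Interference is the received power of all users other than k.
    interference = sum(g[l][a] * p[l][a] for l in range(len(p)) if l != k)
    return g[k][a] * p[k][a] / (noise[a] + interference)


# Hypothetical 2-user, 2-channel instance (illustrative numbers only).
p = [[0.5, 0.5], [1.0, 0.0]]
g = [[1.0, 2.0], [0.5, 1.0]]
noise = [1.0, 1.0]

sinr_00 = sinr(p, g, noise, 0, 0)  # user 0 shares channel 0 with user 1
sinr_01 = sinr(p, g, noise, 0, 1)  # user 0 is alone on channel 1
```

On channel 1, user 1 transmits nothing, so user 0 only sees thermal noise there; on channel 0 the interference term in the denominator is active.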

At one end, we will study static channels, i.e. channels 
whose coherence time is much larger than both the self- 
decodable block duration and the power updating period. At 
the other extreme, we will also consider fast-fading channels 
where the coherence time is much shorter than those charac- 
teristic times; here, what matters is the ergodic value of the 
SINR and the corresponding rate. Finally, we will also analyze 
the more interesting intermediate case, where the coherence 
time is greater than the block length but comparable to the 
update time - hence allowing blocks to be decoded using 
the instantaneous channel values in (2), but also introducing 
stochasticity in the game. 

A. Static Channels

We will start with the case of static channels employing
single-user decoding (SUD). In this case, the spectral efficiency
of user $k$ in the power profile $p$ will be given by [11], [6]:

  $u_k(p) = \sum_\alpha^k b_\alpha \log\left(1 + \mathrm{sinr}_{k\alpha}(p)\right)$   (3)

where $b_\alpha > 0$ represents the bandwidth of channel $\alpha \in \mathcal{A}$
and the channel gains $g_{k\alpha}$ are drawn once and for all from
a continuous distribution on $[0, \infty)$ at the outset of the game
and remain fixed for the duration of the transmission [7], [11].
Then, to maximize their spectral efficiency (3), users saturate
(1) by transmitting at the highest possible power [11], so we
are led to the static PMAC game
$\mathfrak{G} = \mathfrak{G}(\mathcal{K}, \{\Delta_k\}, \{u_k\})$ where:

1) The players of $\mathfrak{G}$ are the transmitters $\mathcal{K} = \{1, \dots, K\}$.

2) The strategy space of player $k$ is the (scaled) simplex
$\Delta_k = P_k \Delta(\mathcal{A}_k) = \{p_k \in \mathbb{R}^{\mathcal{A}_k} : p_{k\alpha} \geq 0 \text{ and } \sum_\alpha^k p_{k\alpha} = P_k\}$
of power allocation vectors; the game's space of strategy
profiles $p = (p_1, \dots, p_K)$ will then be $\Delta = \prod_k \Delta_k$.

3) The players' payoffs (or utilities) are given by the spectral
efficiencies $u_k \colon \Delta \to \mathbb{R}$ of (3).

Of course, the game $\mathfrak{G}$ defined in this way does not adhere
to the original normal form of Nash in the sense that a) players
are not mixing probabilities over a finite set of possible actions,
and b) even though the players' strategy spaces are simplices,
their payoffs are not (multi)linear. On the other hand, with $u_k$
being concave in $p_k$, it is easy to see that $\mathfrak{G}$ is itself concave
in the sense of Rosen [15]. Also, as was shown in [11], $\mathfrak{G}$
possesses an exact potential [16], i.e. a function $\Phi \colon \Delta \to \mathbb{R}$
such that:

  $u_k(p_{-k}; p_k) - u_k(p_{-k}; p_k') = \Phi(p_{-k}; p_k') - \Phi(p_{-k}; p_k)$,   (4)

for all users $k \in \mathcal{K}$, and for all power allocations $p_k, p_k' \in \Delta_k$
of user $k$ and $p_{-k} \in \Delta_{-k} = \prod_{\ell \neq k} \Delta_\ell$ of $k$'s opponents;
in fact, the analysis of [11] also provides the expression:

  $\Phi(p) = -\sum_\alpha b_\alpha \log\left(1 + \sum_k g_{k\alpha} p_{k\alpha} / \sigma_\alpha^2\right)$.   (5)
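
The exact potential property can be checked numerically on a small instance. The sketch below is only an illustration under stated assumptions: every user accesses every channel, the potential includes the bandwidth weights of (3), and the sign convention is the one in which the potential is minimized at equilibrium (so a unilateral gain in utility equals the corresponding drop in potential). The helper names `utility` and `potential` and all numbers are hypothetical.

```python
import math


def utility(p, g, noise, b, k):
    """Spectral efficiency of user k: sum_a b_a log(1 + sinr_ka(p))."""
    total = 0.0
    for a in range(len(noise)):
        interference = sum(g[l][a] * p[l][a] for l in range(len(p)) if l != k)
        total += b[a] * math.log(1.0 + g[k][a] * p[k][a] / (noise[a] + interference))
    return total


def potential(p, g, noise, b):
    """Potential: -sum_a b_a log(1 + sum_k g_ka p_ka / noise_a)."""
    return -sum(
        b[a] * math.log(1.0 + sum(g[l][a] * p[l][a] for l in range(len(p))) / noise[a])
        for a in range(len(noise))
    )


# Hypothetical 2x2 instance; user 0 deviates from p to p_dev.
g = [[1.0, 2.0], [0.5, 1.5]]
noise = [1.0, 0.5]
b = [1.0, 2.0]
p = [[0.3, 0.7], [0.6, 0.4]]
p_dev = [[0.8, 0.2], [0.6, 0.4]]

du = utility(p, g, noise, b, 0) - utility(p_dev, g, noise, b, 0)
dphi = potential(p_dev, g, noise, b) - potential(p, g, noise, b)
```

The two differences `du` and `dphi` agree up to rounding because the interference seen by user 0 does not depend on his own allocation.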

B. Fading Channels 

As discussed above, the non-static models that we will
examine are block-fading and fast-fading channels.

1) Block-fading channels: In this case, the coefficients
$g_{k\alpha} = g_{k\alpha}(t)$ remain constant over an entire transmission
block, so, assuming the transmitter knows (2) for each block
through feedback, the users' utilities are still given by (3), with
different gains $g_{k\alpha}$ at each self-decodable block.¹ As such, (5)
is still a potential for the (now evolving) game $\mathfrak{G}(t)$, the only
difference being that $\Phi$ will evolve over time following the
channels and the game.

¹Note that we do not assume any delay constraints at the receiver [22]; in
this way, reliable information (à la Shannon) can be transmitted over each
block.

2) Fast-fading channels: In this regime, the coefficients
$h_{k\alpha} = h_{k\alpha}(t)$ evolve ergodically at a rate which is much
faster than the characteristic length of a transmission block, so
the "instantaneous" utilities (3) lose their relevance. Instead,
and assuming for simplicity that users saturate their power
constraints, their utilities will be given by the ergodic rates of
[23]:

  $\bar{u}_k(p) = \sum_\alpha^k b_\alpha \, \mathbb{E}_g\left[\log\left(1 + \mathrm{sinr}_{k\alpha}(p)\right)\right]$.   (6)

We thus obtain the ergodic game
$\overline{\mathfrak{G}} = (\mathcal{K}, \{\Delta_k\}, \{\bar{u}_k\})$, which
has the same strategic structure as its static counterpart $\mathfrak{G}$ but
payoffs given by (6) instead of (3).

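In fact, as in the static case, $\overline{\mathfrak{G}}$ admits the exact potential:

  $\bar\Phi(p) = \mathbb{E}_g[\Phi(p)] = -\sum_\alpha b_\alpha \, \mathbb{E}_g\left[\log\left(1 + \sum_k g_{k\alpha} p_{k\alpha} / \sigma_\alpha^2\right)\right]$,   (7)

whose form depends on the law of the $g_{k\alpha}$. Thus, with
$h_{k\alpha} \sim \mathcal{CN}(0, \sqrt{\gamma_{k\alpha}})$, $\gamma_{k\alpha} > 0$, the coefficients
$g_{k\alpha} = |h_{k\alpha}|^2$ will be $\chi^2$-distributed, and the calculations of
[24, eq. (11)] yield:

Proposition 1. In i.i.d. Gaussian fast-fading channels with
$h_{k\alpha} \sim \mathcal{CN}(0, \sqrt{\gamma_{k\alpha}})$, the ergodic potential $\bar\Phi$ is:

where $r_{k\alpha} = \gamma_{k\alpha} p_{k\alpha} / \sigma_\alpha^2$ and
$\zeta(x) = \int_0^\infty (x + t)^{-1} e^{-t} \, dt = -e^{x} \, \mathrm{Ei}(-x)$.

This proposition will be crucial in the numerical calculations
of Section V. For posterity, we only note here that (8) implies
that $\bar\Phi$ is strictly convex [25], even though, in general, $\Phi$ is
not.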
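
To give a feel for the function $\zeta$, the sketch below evaluates it by numerical quadrature and compares it with a directly computed ergodic rate in the simplest possible case: a single user whose gain is exponentially distributed with unit mean, for which integration by parts gives $\mathbb{E}[\log(1 + r g)] = \zeta(1/r)$. This single-user reduction, the truncation of the integrals at a finite cutoff, and the helper names are assumptions made for illustration; the sketch does not reproduce the multi-user expression of Proposition 1.

```python
import math


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule on [lo, hi] with an even number n of sub-intervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3.0


def zeta(x, cutoff=60.0):
    """zeta(x) = int_0^inf exp(-t) / (x + t) dt, truncated at `cutoff`."""
    return simpson(lambda t: math.exp(-t) / (x + t), 0.0, cutoff)


def ergodic_rate_single_user(r, cutoff=60.0):
    """E[log(1 + r*g)] for g ~ Exp(1), by direct numerical integration."""
    return simpson(lambda s: math.log(1.0 + r * s) * math.exp(-s), 0.0, cutoff)


r = 2.0                              # hypothetical value of gamma * p / sigma^2
lhs = ergodic_rate_single_user(r)    # averaged log-rate
rhs = zeta(1.0 / r)                  # closed form via zeta
```

The tail beyond the cutoff is of order $e^{-60}$, so the truncation does not affect the comparison at the displayed precision.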

III. Equilibrium Analysis
We begin with the notion of Nash equilibrium: 

Definition 1. A power profile $q \in \Delta$ will be a Nash equilibrium
of the game $\mathfrak{G}$ (resp. $\overline{\mathfrak{G}}$) when:

  $u_k(q) \geq u_k(q_{-k}; q_k')$   (resp. $\bar{u}_k(q) \geq \bar{u}_k(q_{-k}; q_k')$)   (9)

for all $k \in \mathcal{K}$ and for all $q_k' \in \Delta_k$. In particular, if $q$ satisfies
the strict version of (9) for all $q_k' \neq q_k$, it will be called strict.

Given that $\mathfrak{G}$ (resp. $\overline{\mathfrak{G}}$) admits a convex potential, its
equilibrium set will coincide with the minimum set of $\Phi$
(resp. $\bar\Phi$) [26]. As such, the existence of an equilibrium is
guaranteed, and this is already important from a practical
point of view because learning protocols would never converge
otherwise. Our goal in this section will be to show that these
equilibria are essentially unique, thus ensuring the system's
predictability - a crucial feature for performance evaluation,
QoS guarantees, etc.

A. Static Channels 

With regard to the static potential $\Phi$, it is easy to see that
two power profiles $p, p' \in \Delta$ will have $\Phi(p) = \Phi(p')$ whenever

  $\sum_k g_{k\alpha} (p_{k\alpha}' - p_{k\alpha}) = 0$ for all $\alpha \in \mathcal{A}$.   (10)

In that case, $\Phi$ will not be strictly convex, so its minimum set
might fail to be a singleton as well. More precisely, if we set
$z = p' - p$, we will also have $\sum_\alpha^k z_{k\alpha} = 0$ for all $k \in \mathcal{K}$, so,
on account of (10) above, $\Phi$ will not be strictly convex if the
following linear system admits a non-zero solution in $z$:

  $\sum_k g_{k\alpha} z_{k\alpha} = 0$, $\alpha \in \mathcal{A}$;   $\sum_\alpha^k z_{k\alpha} = 0$, $k \in \mathcal{K}$.   (11)

Since $z \in \mathbb{R}^Q$, $Q = \sum_k A_k$, and the above $A + K$ constraints
are independent (a.s.), we see that if $Q - A - K > 0$, then $\Phi$
cannot be strictly convex - see [27] for more details. In fact,
the quantity $\mathrm{ind}(\mathfrak{G}) = Q - A - K$ will be called the degeneracy
index of the game $\mathfrak{G}$, and the condition $\mathrm{ind}(\mathfrak{G}) > 0$ means that
if the number of links $Q$ exceeds the number of channels plus
transmitters $A + K$, then the game's potential is not strictly
convex.
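
The degeneracy index is a simple counting quantity, as the short sketch below illustrates. The function name `degeneracy_index` and the example sizes are hypothetical; the only input is the number of channels each user can access.

```python
def degeneracy_index(channels_per_user, num_channels):
    """ind(G) = Q - A - K, with Q the total number of user-channel links."""
    Q = sum(channels_per_user)   # Q = sum_k A_k
    K = len(channels_per_user)   # number of transmitters
    return Q - num_channels - K


# Every user can access every channel (A_k = A), so Q = A * K.
ind_2x2 = degeneracy_index([2, 2], 2)      # Q = 4,  A = 2, K = 2
ind_3x4 = degeneracy_index([4, 4, 4], 4)   # Q = 12, A = 4, K = 3
```

In the 2 × 2 case the index vanishes, whereas already with three users and four channels it is strictly positive, so the potential cannot be strictly convex there.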

In typical uplink scenarios of practical interest (e.g. single-receiver
OFDM), each user can access all channels, so $\mathcal{A}_k = \mathcal{A}$
for all $k$, and, hence $Q = AK > A + K$ (except in small
2 × 2 systems). This implies that degeneracy appears almost
always, so in the absence of strict convexity, a promising
way to determine whether the PMAC game admits a unique
equilibrium would be to use the conditions of [7], [8], [28]
where one constructs a certain matrix $S$ from the channel gain
coefficients and tries to show that said matrix has a spectral
radius $\rho(S) < 1$. However, as is shown in [27], this spectral
radius exceeds 1 (a.s.), so the results of [7], [8], [28] do not
apply to our problem. Still, we have:

Theorem 1. The static game $\mathfrak{G}$ admits a unique Nash equilibrium
(a.s.).

Sketch of Proof: Let $p \in \Delta$ and consider the (multi)graph
$\mathcal{G}(p) = (\mathcal{A}, \mathcal{E}(p))$ whose vertices are the network's receivers
and whose edge set $\mathcal{E}(p)$ is the multiset sum $\mathcal{E}(p) = \biguplus_k \mathcal{E}_k(p_k)$
where each $\mathcal{E}_k(p_k)$ is a star graph on the nodes $\alpha \in \mathcal{A}_k$ to which
$p_k$ assigns positive power $p_{k\alpha} > 0$ (i.e. all channels to which
$k$ transmits with positive power are star-connected and these
graphs are superimposed for all $k \in \mathcal{K}$). If $p$ is equilibrial,
$\mathcal{G}(p)$ has to be acyclic [27], so $p$ must lie in the interior of
an at most $(A - 1)$-dimensional face of $\Delta$ (a.s.); our assertion
then follows from a dimension-counting argument (see [27]
for details). ■
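
The acyclicity condition in the proof sketch can be tested mechanically for a given support structure. The following is a minimal sketch under the assumption that each user's star is centered at his lowest-indexed active channel (the choice of center does not affect whether the superimposed multigraph has a cycle); the function name `is_acyclic` and the union-find implementation are illustrative and are not taken from [27].

```python
def is_acyclic(supports, num_channels):
    """Check that the superposition of the users' star graphs has no cycle.

    supports[k] is the set of channels on which user k transmits with
    positive power; parallel edges between two channels count as a cycle.
    """
    parent = list(range(num_channels))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for support in supports:
        channels = sorted(support)
        if not channels:
            continue
        center = channels[0]
        for a in channels[1:]:
            root_a, root_c = find(a), find(center)
            if root_a == root_c:
                return False      # this edge would close a cycle
            parent[root_a] = root_c
    return True


shared_pair = is_acyclic([{0, 1}, {0, 1}], 2)   # two users on the same two channels
chain = is_acyclic([{0, 1}, {1, 2}], 3)         # supports overlap in one channel only
```

Two users who both split their power over the same pair of channels create a double edge, which is the simplest configuration excluded at equilibrium by the acyclicity argument.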

B. Fading Channels 

1) Block-fading channels: As we discussed in Section II-B,
the time-varying version of (5) which corresponds to block-fading
channels $g_{k\alpha} = g_{k\alpha}(t)$ is a potential for the block-fading
game $\mathfrak{G}(t)$. Accordingly, Theorem 1 implies:

Corollary 1. At each channel realization, the block-fading
game $\mathfrak{G}(t)$ admits a unique Nash equilibrium (a.s.).

2) Fast-fading channels: On the other hand, the averaging
effect in the ergodic rates (6) can be used to show that the
ergodic potential $\bar\Phi$ is, in fact, strictly convex. This gives:

Theorem 2 ([25]). The ergodic game $\overline{\mathfrak{G}}$ admits a unique Nash
equilibrium.

IV. Learning Dynamics and Convergence to Equilibrium 

Although Theorems 1 and 2 guarantee equilibrium unique- 
ness, it is not at all clear whether users will be able to calculate 
this equilibrium in decentralized environments where only 
partial/local information is available at the terminal (e.g., as in 
distributed or partially distributed cognitive radio networks). 
Consequently, our goal in this section will be to present 
a simple distributed learning scheme which allows users to 
converge to equilibrium, and to determine the speed of this 
convergence. 



This question has attracted considerable interest from the 
point of view of learning, and two of the most well-studied 
paradigms are best-response (BR) algorithms and reinforcement
learning (RL) [29], [19], [20]. In standard BR schemes
[20], players are assumed to monitor their opponents' power 
allocation policies and respond optimally to them (with respect 
to their individual utilities). Unfortunately (and in addition 
to the "perfect monitoring" requirement), it is quite hard to 
calculate these best responses in large games, so the applica- 
bility of this approach to large decentralized networks is quite 
limited. To circumvent these limitations (in static channels at 
least), a promising solution lies in the water-filling approach 
of [12], [7], [8], [17] where users only need their local channel 
and overall noise-plus-interference covariance matrix. In that
case, however, a) users must solve a non-convex fixed point
problem at each step; and b) convergence is conditional on
the interference being low enough (except in [17]). In fact,
the conditions of [7], [8] do not hold in the PMAC case [27],
while the approach of [12] breaks down for large numbers of
users [30]. 

On the other hand, RL algorithms (such as regret-matching 
[31]) rely on the players knowing their (possibly fictitious) 
payoffs. Thanks to this information (which, however, is often 
hard to come by), these algorithms enjoy strong convergence 
properties in potential games. However, such learning algo- 
rithms have been designed for discrete action sets, so it is very 
hard to adapt them to games with continuous action spaces 
(such as the ones we are considering here). 

To overcome these limitations, our starting point will be the
replicator dynamics of evolutionary game theory [18], [19],
[20]. The reinforcement aspect of these dynamics does suffer
from the same drawback as most RL algorithms (i.e. it applies
only to finite action sets), but, by exploiting the simplicial
structure of the game and its potential, we derive a learning
scheme which applies to continuous action spaces and which
allows users to converge to equilibrium unconditionally and
exponentially quickly (Theorems 3 and 5).

A. Static Channels 

Since the replicator equation applies to discrete sets (such
as $\mathcal{A}_k$), a reasonable channel-specific utility would be:

  $u_{k\alpha}(p) = b_\alpha \log\left(1 + \mathrm{sinr}_{k\alpha}(p)\right)$   (12)

which leads to the replicator equation:

  $\dfrac{dp_{k\alpha}}{dt} = p_{k\alpha}(t) \left( u_{k\alpha}(p(t)) - P_k^{-1} \sum_\beta^k p_{k\beta}(t) \, u_{k\beta}(p(t)) \right)$,   (13)

whose second term ensures that $p(t) \in \Delta$ for all $t \geq 0$. Unfortunately,
the utility $u_k$ of eq. (3) is not a convex combination
of the $u_{k\alpha}$, so (13) is not well-behaved w.r.t. the game $\mathfrak{G}$ either
- for instance, Nash equilibria are not stationary.

Instead, given that each user invariably seeks to unilaterally
increase his utility, we will consider the marginal utilities:

  $v_{k\alpha}(p) = \dfrac{\partial u_k}{\partial p_{k\alpha}} = \dfrac{b_\alpha g_{k\alpha}}{\sigma_\alpha^2 + \sum_\ell g_{\ell\alpha} p_{\ell\alpha}}$.   (14)

Since player $k$ can calculate $v_{k\alpha}(p)$ by means of $\mathrm{sinr}_{k\alpha}$ and
$g_{k\alpha}$ alone (the bandwidths $b_\alpha$ are assumed fixed and known),
any dynamics based on the $v_{k\alpha}$'s will be inherently distributed
(and simpler than solving a water-filling problem to boot). We
will thus consider the replicator equation:

  $\dfrac{dp_{k\alpha}}{dt} = p_{k\alpha}(t) \left[ v_{k\alpha}(p(t)) - v_k(p(t)) \right]$,   (15)

where $v_k$ denotes the user average $v_k(p) = P_k^{-1} \sum_\beta^k p_{k\beta} v_{k\beta}(p)$.

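
A simple way to get a feel for the replicator dynamics driven by the marginal utilities is to integrate them numerically: each user's power on a channel grows in proportion to how much that channel's marginal utility exceeds the user's power-weighted average. The sketch below uses an explicit Euler discretization on a hypothetical 2 × 2 instance in which every user accesses every channel; the step size, the gains, and the helper names are assumptions made for illustration and are not the paper's simulation setup.

```python
def marginal_utilities(p, g, noise, b):
    """v[k][a] = b_a g_ka / (noise_a + sum_l g_la p_la)."""
    K, A = len(p), len(noise)
    load = [noise[a] + sum(g[l][a] * p[l][a] for l in range(K)) for a in range(A)]
    return [[b[a] * g[k][a] / load[a] for a in range(A)] for k in range(K)]


def replicator_step(p, g, noise, b, P, dt):
    """One explicit Euler step of dp_ka/dt = p_ka (v_ka - v_k)."""
    v = marginal_utilities(p, g, noise, b)
    A = len(noise)
    new_p = []
    for k in range(len(p)):
        # Power-weighted average marginal utility of user k.
        v_avg = sum(p[k][a] * v[k][a] for a in range(A)) / P[k]
        new_p.append([p[k][a] * (1.0 + dt * (v[k][a] - v_avg)) for a in range(A)])
    return new_p


# Hypothetical instance: user 0 is stronger on channel 1, user 1 on channel 0.
g = [[1.0, 2.0], [1.5, 0.5]]
noise = [1.0, 1.0]
b = [1.0, 1.0]
P = [1.0, 1.0]
p = [[0.5, 0.5], [0.5, 0.5]]     # uniform initial allocation
for _ in range(20000):
    p = replicator_step(p, g, noise, b, P, dt=0.01)
```

Each Euler step preserves the total power of every user exactly (up to rounding), because the weighted deviations from the average sum to zero; in this instance the users separate onto their stronger channels.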
Remark 1 (Comparison to other power updating schemes). The 
replicator equation (15) is clearly quite unlike the water-filling 
schemes of [12], [7], [8], [17]. Closer in spirit to (15) are the 
algorithms developed in the 90's with the goal of minimizing 
transmitting power by comparing the instantaneous SINR to a 
target value and iteratively updating the power proportionally 
to this difference [32], [33], [34]; still, there is little overlap 
with these algorithms, both in terms of setup and convergence. 

Remark 2. The marginal utilities $v_{k\alpha}$ are similar but not equal
to the SINR (2) of user $k$ at a given channel $\alpha \in \mathcal{A}_k$, and
they do not coincide with the popular metric of total rate per
unit power either [13], [3]. These metrics can all be calculated
based on the same feedback (and might appear more appealing
than $v_{k\alpha}$), but we shall see that it is precisely the (perhaps
unconventional) choice of the marginal utilities that leads
to convergence.

The first important property of (15) is that its rest points
satisfy the (waterfilling) condition $v_{k\alpha}(p) = v_{k\beta}(p)$ for all nodes
$\alpha, \beta \in \mathrm{supp}(p_k)$ to which user $k$ transmits with positive power.
Hence, by the Karush-Kuhn-Tucker (KKT) conditions of [11],
we see that Nash equilibria of $\mathfrak{G}$ are stationary in (15).

The converse of this statement is not true: every vertex of 
A is stationary without necessarily being a Nash equilibrium. 
Nonetheless, the game's (unique) Nash equilibrium is the only 
attracting state of the dynamics (see Appendix A for the 
proof): 

Theorem 3. Let $q \in \Delta$ be the (a.s.) unique equilibrium of $\mathfrak{G}$.
Then, every interior solution orbit of the replicator dynamics
(15) converges to $q$; moreover, there exists $c > 0$ such that

  $D_{\mathrm{KL}}(q \,\|\, p(t)) \leq D_{\mathrm{KL}}(q \,\|\, p(0)) \, e^{-ct}$ for all $t \geq 0$,   (16)

where $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence. In other words,
replicator trajectories converge to an $\varepsilon$-neighborhood of a
Nash equilibrium in time $O(\log(1/\varepsilon))$.

Remark 1. To the best of our knowledge, the exponential 
convergence rate of Theorem 3 (see also Theorem 6 in App. A 
and Fig. 1) is the fastest known estimate for the replicator 
dynamics (see [20] for a review of the state of the art). 

Remark 2. The Kullback-Leibler divergence is defined as [20]:

  $D_{\mathrm{KL}}(q \,\|\, p) = \sum_{k,\alpha} q_{k\alpha} \log\left(q_{k\alpha} / p_{k\alpha}\right)$.   (17)

Clearly, $D_{\mathrm{KL}}(q_k \,\|\, p_k)$ is finite if and only if $p_k$ allocates positive
power $p_{k\alpha} > 0$ to all channels $\alpha \in \mathrm{supp}(q_k)$ which are present
in $q_k$. Thus, in particular, Theorem 3 guarantees that uniform
initial power allocations equilibrate exponentially quickly.
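
The finiteness condition in this remark is easy to see in code. The sketch below computes the Kullback-Leibler divergence between two power profiles stored as nested lists, with the usual convention that terms with zero weight in the first argument contribute nothing; the function name `kl_divergence` and the example profiles are hypothetical.

```python
import math


def kl_divergence(q, p):
    """D_KL(q || p) = sum_{k,a} q_ka log(q_ka / p_ka), with 0 log 0 := 0."""
    total = 0.0
    for q_k, p_k in zip(q, p):
        for q_ka, p_ka in zip(q_k, p_k):
            if q_ka == 0.0:
                continue             # channels outside supp(q_k) contribute nothing
            if p_ka == 0.0:
                return math.inf      # p_k misses a channel that q_k uses
            total += q_ka * math.log(q_ka / p_ka)
    return total


q = [[0.0, 1.0], [1.0, 0.0]]            # hypothetical vertex equilibrium
uniform = [[0.5, 0.5], [0.5, 0.5]]      # uniform initial allocation

d_from_uniform = kl_divergence(q, uniform)   # finite: uniform covers every channel
d_reverse = kl_divergence(uniform, q)        # infinite: q misses some channels
```

A uniform initial allocation puts positive power on every channel, so its divergence from any equilibrium is finite, which is the situation covered by Theorem 3.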

Remark 3. In the evolutionary analysis of [35], the fitness 
of a species (the number of descendants in the unit of time) 
might be nonlinear, but it is still a convex combination of each



phenotype's fitness. In this special case, the dynamics of [35] 
are formally equivalent to (15), and it is shown therein that 
their limit points are Nash equilibria. Theorem 3 (see also 
Theorem 6) extends this analysis by demonstrating that the 
replicator dynamics really do converge to Nash equilibrium,
and that the rate of this convergence is exponential. 

Remark 4. It was shown in [7] that iterative water-filling algo- 
rithms converge exponentially when the water-filling operator 
is a contraction. However, given that the sufficient conditions 
which guarantee the contraction property fail in the PMAC case
[27], the analysis of [7] does not apply here. 

Of course, the value of the exponent of (16) is critical be- 
cause it controls how fast users converge to equilibrium. Thus, 
if we consider the "instantaneous" convergence exponents: 



  $\lambda_k(t) = -\dfrac{1}{t} \log \dfrac{D_{\mathrm{KL}}(q_k \,\|\, p_k(t))}{D_{\mathrm{KL}}(q_k \,\|\, p_k(0))}$,   (18)

then Theorem 3 simply states that the total equilibration rate
$\lambda(t) = \min_k\{\lambda_k(t)\}$ is at least $c$. For the sake of simplicity, we
will only present here an analytic expression for the value of
$c$ for strict equilibria (for the full analysis including non-strict
equilibria, see Appendix A). In this case, if user $k$ transmits
with full power to channel $\alpha_k$ at equilibrium, we will have:

  $c_k = \liminf_t \{\lambda_k(t)\} = \gamma_k^{-1} \left(1 - e^{-\gamma_k}\right) \Delta v_k$ and $c = \min_k\{c_k\}$   (19)

where $\Delta v_k = \min\{v_{k,\alpha_k}(q) - v_{k\beta}(q) : \beta \neq \alpha_k\}$ is $k$'s minimum
deviation cost and $\gamma_k = D_{\mathrm{KL}}(q_k \,\|\, p_k(0)) / P_k$. We thus obtain:

Proposition 2. If $q = \sum_k P_k e_{k,\alpha_k}$ is a strict equilibrium of the
static game $\mathfrak{G}$, the power of user $k$ on channel $\alpha_k$ grows as:

  $p_{k,\alpha_k}(t) \sim P_k \left(1 - e^{-\Delta v_k t}\right)$.   (20)

Proof: From (19), we have $c_k \to \Delta v_k$ as $\gamma_k \to 0$. However,
since $D_{\mathrm{KL}}(q_k \,\|\, p_k(t)) \to 0$ as $t \to \infty$ (Theorem 3), we will have
$\gamma_k \to 0$ by definition, and our assertion follows. ■

B. Fading Channels 

1) Block-fading channels: In this case, the replicator equation
(15) becomes non-deterministic because the coefficients
$g_{k\alpha}$ evolve stochastically over time. To account for this, we
will rewrite the replicator dynamics (15) in discrete time as:

  $\Delta p_{k\alpha}(n+1) = \delta(n) \, p_{k\alpha}(n) \left[ v_{k\alpha}(p(n), g(n)) - v_k(p(n), g(n)) \right]$,   (21)

where $\Delta p_{k\alpha}(n+1) = p_{k\alpha}(n+1) - p_{k\alpha}(n)$ and the "step" $\delta(n)$ is
a (possibly time-dependent) learning parameter.

For simplicity, we will concentrate here on the case where
the temporal variations of the channels are uncorrelated. In this
case, if we set $\bar{v}_{k\alpha} = \mathbb{E}[v_{k\alpha}]$ and $\eta_{k\alpha} = v_{k\alpha} - \bar{v}_{k\alpha}$, we obtain:

  $p_{k\alpha}(n+1) = p_{k\alpha}(n) + \delta(n) \, p_{k\alpha}(n) \left[ \bar{v}_{k\alpha}(p(n)) - \bar{v}_k(p(n)) \right]$
      $+ \; \delta(n) \, p_{k\alpha}(n) \left[ \eta_{k\alpha}(p(n), g(n)) - \eta_k(p(n), g(n)) \right]$,   (22)

where the Vka are deterministic and the rjka are zero-mean. In 
fact, if we interchange expectation and differentiation, we get: 



Vkaip) = E 



duk 



dpka 



d 

dpka 



E[Mi] 



duk 

dpka' 



(23) 



Convergence of the replicator dynamics in a 2x2 system 
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p] : normalized power of User 1 in channel A 

(a) Convergence of the replicator dynamics in static channels. 
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(b) Spectral efficiency over time for ditferent initial configurations. 



Fig. 1. Convergence to equilibrium in a 2 X 2 game with static channels. The dashed contours in Fig. 1(a) are the level sets of the K-L divergence w.r.t. the 
game's equilibrium (red dot), while pi,P2 represent the normalized power allocation of each user. In Fig. 1(b), we plot the spectral efficiency m of each user 
as a function of time for three randomly drawn initial configurations; as can be seen in the inlay, the equilibration rate A(t) coincides wiffi the value predicted 
by Theorem 3 (solid black line). 



SO the mean utilities of (22) are the gradients of the ergodic 
rates (6). Thus, if we remove the noise 77^0, from (22), the 
general theory of [21] shows that (22) will track the mean- 
field equation: 

dpka 



term and with a constant step 6{n) 
App. A): 



6. We thus obtain (see 



dt 



= Pka(t) [Vka(pit)) - Vi,{p{t))], 



(24) 



SO the asymptotic properties of (22) will follow those of (24). 

We thus see that the asymptotic behavior of the replicator 
algorithm (21) for block-fading channels is intimately linked to 
the ergodic game $\overline{\mathfrak{G}}$. In itself, this is quite natural because the 
ergodic equilibrium of $\overline{\mathfrak{G}}$ represents the only reasonable 
time-invariant equilibrial notion for the block-fading game $\mathfrak{G}(t)$ with 
temporally uncorrelated channels (see also [36]). More to the 
point, we have (see App. B for the proof): 

Theorem 4. If the learning parameters $\delta(n)$ of (21) satisfy 
$\sum_n \delta(n) = \infty$ and $\sum_n \delta^2(n) < \infty$, then the algorithm (21) 
for temporally uncorrelated block-fading channels converges 
(a.s.) to the (unique) Nash equilibrium of the ergodic game $\overline{\mathfrak{G}}$. 

Remark. The most usual choice for the parameters $\delta(n)$ is 
$\delta(n) = 1/n$. These variable rates can be interpreted either as 
the actual time step of the algorithm, or as a discount that 
users apply to their updating scheme at every tick of a timer. 
This last interpretation is crucial for practical purposes because 
there are hard limits to how fast a device can update its policy. 
For constant $\delta(n) = \delta$, the dynamics (21) evolve faster, but 
convergence to equilibrium is in the distribution sense of [21]. 

2) Fast-fading channels: As we have already noted in 
Section II-B, the users' (ergodic) rates (6) in the fast-fading 
regime depend on the channels' statistics, so instantaneous 
channel information obtained when updating their powers is 
of little use. Because of this, the system becomes effectively 
deterministic, so, similarly to the static case, users may base 
their learning on the (mean) replicator learning dynamics (24), 
i.e. the discrete-time learning scheme (21) without the noise. 

Theorem 5. The mean dynamics (24) converge to the (unique) 
Nash equilibrium of the ergodic game $\overline{\mathfrak{G}}$, and this convergence 
is exponential: interior orbits $\varepsilon$-equilibrate in time 
$O(\log(1/\varepsilon))$. 

Remark. If the ergodic equilibrium is strict, the convergence 
exponent (18) of the mean dynamics (24) is just (App. A): 

$$ c = \min\nolimits_k \{c_k\} \quad \text{with} \quad c_k = h_k^{-1}\,(1 - e^{-h_k})\,\Delta\bar v_k, \tag{25} $$

where $h_k = D_{\mathrm{KL}}(q \,\|\, p(0))/P_k$ is as in (19), but now $\Delta\bar v_k = 
\min\{\bar v_{k,\alpha_k}(q) - \bar v_{k\beta}(q) : \beta \neq \alpha_k\}$ is the minimum deviation cost 
of user $k$ for the mean marginal utilities $\bar v_{k\alpha}$. 
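To get a feel for the exponent in (25), the following minimal sketch evaluates $c_k = h_k^{-1}(1 - e^{-h_k})\,\Delta\bar v_k$ and the resulting $c = \min_k c_k$. The function name, argument names and sample numbers are illustrative assumptions; they are not taken from the simulations of Section V.

```python
import math

def convergence_exponent(h0, powers, gaps):
    """Return c = min_k c_k, with c_k = (1 - exp(-h_k)) / h_k * gap_k and h_k = h0 / P_k.

    h0     : initial K-L divergence D_KL(q || p(0)), assumed > 0
    powers : total powers P_k, one per user
    gaps   : minimum deviation costs, one per user (hypothetical inputs)
    """
    exponents = []
    for P_k, gap_k in zip(powers, gaps):
        h_k = h0 / P_k
        # -expm1(-h) equals 1 - exp(-h) and stays accurate for small h
        exponents.append(-math.expm1(-h_k) / h_k * gap_k)
    return min(exponents)

# Two users with equal deviation costs but different power budgets
c = convergence_exponent(1.0, [1.0, 2.0], [1.0, 1.0])
```

Since $(1 - e^{-h})/h$ tends to 1 as $h \to 0$ and behaves like $1/h$ for large $h$, the guaranteed rate approaches the smallest deviation cost for orbits that start close to equilibrium, and degrades roughly inversely with the initial divergence otherwise.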

V. Numerical Simulations 

In this section, our aim is to validate our theoretical results 
by means of numerical simulations. We begin by introducing 
the sum-rate efficiency (SRE) of a power profile p: 



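$$ \mathrm{SRE}(p) = \frac{\sum_k u_k(p)}{C_{\mathrm{sum}}} \quad \Bigl(\text{resp. } \overline{\mathrm{SRE}}(p) = \frac{\sum_k \bar u_k(p)}{\overline{C}_{\mathrm{sum}}} \text{ for } \overline{\mathfrak{G}}\Bigr), \tag{26} $$

i.e. the ratio of the sum of achievable rates in the power profile 
$p$ over the maximum achievable aggregate sum-rate under SIC 
(which is the sum-capacity of the multiple access channel 
(MAC)); interestingly, if $q$ is the game's (unique) equilibrium, 
then $C_{\mathrm{sum}} = -\Phi(q)$ (and similarly for the ergodic case). 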

In Fig. 2(a), we plot the SRE at Nash equilibrium for 
randomly drawn static channels. While the equilibrium SRE 
can deviate significantly from its maximum value (unity) for 
A < K, in the A > K regime, the SRE is typically close to 
100% (and, in fact, equal to 100% with positive probability). 
We may attribute this to the fact that for A > K there is a 
finite probability that the system's equilibrium is at a vertex 
of $\Delta$ where each user is alone on a single channel, inducing 
optimal performance. Thus, with fair probability (close to 1/2 
based on our simulations) the complex SIC scheme yields no 
performance benefit over the much simpler SUD approach. 
Analogously, in Fig. 2(b), we plot the SRE at equilibrium 
for ergodic channels by using Proposition 1 to evaluate the 
maximum sum-rate under SIC; in that case, while the SRE is 
nearly optimal for small SNR, it deviates strongly from its 
maximum value for larger SNR values. 

(a) Equilibrium sum-rate efficiency for static channels. Panel title: "Sum Rate Efficiency Distribution; K = 3; A = 2 (blue) and A = 1 (red)". 

(b) Equilibrium sum-rate efficiency for ergodic channels. 

Fig. 2. The CDF of the equilibrium sum-rate efficiency for static channels (Fig. 2(a)) and the equilibrium SRE for ergodic Gaussian i.i.d. channels as a 
function of the thermal SNR parameter $\rho = P_{\max}/\sigma^2$ (Fig. 2(b)) for different numbers of channels A and users K (all with similar maximum power constraints). 

(a) Sum-rate efficiency and equilibration over time for static channels. Panel title: "Sum-rate efficiency and equilibration over time (A = 20 channels)". 

(b) Long-term equilibration in block-fading channels. 

Fig. 3. Equilibration (and its efficiency) for different numbers of users and channels. In static channels, the replicator dynamics equilibrate extremely fast, 
even for a large number of users; in temporally uncorrelated block-fading channels, the dynamics still converge, but slower, due to the discounting $\delta(n) = 1/n$ 
in (21). 

Panel titles: "Equilibration under Rayleigh fading ($v$ = 5 km/h, $\nu$ = 2 GHz, sampling step = 3 ms)" and "Equilibration under Rayleigh fading ($v$ = 15 km/h, $\nu$ = 2 GHz, sampling step = 3 ms)". 

Fig. 4. The Nash power level (dashed blue) and the actual power level learned by a user (solid red) in a $2 \times 2$ Jakes-fading game. As can be seen by the 
cross-correlograms of the two time-series (inlays), users are away from equilibrium for only a very short amount of time (light blue shading): we observe 
a 9 ms tracking delay for user velocities in the 5 km/h range and 6 ms for 15 km/h, meaning that the replicator dynamics converge within 10-15% of the 
system's coherence time (108 ms and 36 ms respectively). 

Furthermore, to test the equilibration rate of (21) for dif- 
ferent values of K and A, we introduce the equilibration level 
(EQL): 

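$$ \mathrm{EQL}(p) = \Phi(p)/\Phi(q) \quad \bigl(\text{resp. } \overline{\mathrm{EQL}}(p) = \overline{\Phi}(p)/\overline{\Phi}(q) \text{ for } \overline{\mathfrak{G}}\bigr), \tag{27} $$

with $q$ being the equilibrium of $\mathfrak{G}$ (resp. $\overline{\mathfrak{G}}$), so an EQL of 1 
implies that the system has reached its equilibrium. 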

Beginning with the static case, in Fig. 3(a) we drew 50 
channel realizations and ran the discrete-time learning scheme 
(21) with constant $g(n)$ and $\delta(n)$ for A = 20 channels and 
K = 5, 10, 20 and 30 users. Then, by plotting the average 
SRE and EQL over time, we see that even for 30 users, the 
system equilibrates within a few tens of iterations. On the 
other hand, for the ergodic block-fading scenario of Fig. 3(b), 
we used A = 10 channels and plotted the system's EQL over 
time for K = 2, 4 and 8 users learning with $\delta(n) = 1/n$. 
As predicted by Theorem 4, the system converges to the 
game's ergodic equilibrium, but slower due to the discounting 
$\delta(n) = 1/n$. As for fast-fading users following the 
discrete-time version of (24), the EQL and SRE plots were 
virtually identical to the static case (to be expected since 
both dynamical systems are deterministic), so they have been 
omitted for space considerations. 
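The static experiment can be mimicked with a few lines of code. The sketch below is not the authors' simulation code: it assumes the noise-free update $p_{k\alpha}(n+1) = p_{k\alpha}(n)\,[1 + \delta\,(v_{k\alpha} - \bar v_k)]$ with marginal utilities $v_{k\alpha} = g_{k\alpha}/(\sigma_\alpha^2 + \sum_\ell g_{\ell\alpha} p_{\ell\alpha})$ and $\bar v_k = P_k^{-1}\sum_\beta p_{k\beta} v_{k\beta}$, which is the multi-user form suggested by the one-user recursion (37) of App. B. The quantity `log_sum` stands in for the magnitude of the potential, which is also an assumption since its explicit form is not restated in this section. All function names are hypothetical.

```python
import math

def marginal_utilities(p, g, sigma2):
    """v[k][a] = g[k][a] / (sigma2[a] + sum_l g[l][a] * p[l][a])."""
    K, A = len(p), len(sigma2)
    load = [sigma2[a] + sum(g[l][a] * p[l][a] for l in range(K)) for a in range(A)]
    return [[g[k][a] / load[a] for a in range(A)] for k in range(K)]

def replicator_step(p, g, sigma2, P, delta):
    """One noise-free discrete-time replicator update with step size delta."""
    v = marginal_utilities(p, g, sigma2)
    new_p = []
    for k, pk in enumerate(p):
        # power-weighted average marginal utility of user k
        v_bar = sum(pk[a] * v[k][a] for a in range(len(pk))) / P[k]
        new_p.append([pk[a] * (1.0 + delta * (v[k][a] - v_bar)) for a in range(len(pk))])
    return new_p

def run_learning(p0, g, sigma2, P, delta, steps):
    """Iterate the update `steps` times from the initial profile p0."""
    p = [row[:] for row in p0]
    for _ in range(steps):
        p = replicator_step(p, g, sigma2, P, delta)
    return p

def log_sum(p, g, sigma2):
    """Assumed potential magnitude: sum_a log(1 + sum_k g[k][a] p[k][a] / sigma2[a])."""
    K = len(p)
    return sum(math.log(1.0 + sum(g[k][a] * p[k][a] for k in range(K)) / sigma2[a])
               for a in range(len(sigma2)))

def equilibration_level(p, q, g, sigma2):
    """Ratio of the assumed potential at p over its value at the equilibrium q."""
    return log_sum(p, g, sigma2) / log_sum(q, g, sigma2)

# One user, two identical channels: the equilibrium is the even split.
g, sigma2, P = [[1.0, 1.0]], [1.0, 1.0], [1.0]
p_start = [[0.2, 0.8]]
p_final = run_learning(p_start, g, sigma2, P, delta=0.5, steps=400)
eql_start = equilibration_level(p_start, [[0.5, 0.5]], g, sigma2)
```

In this toy case the update preserves each user's total power exactly (up to rounding) and contracts towards the even split at a geometric rate, mirroring the fast equilibration reported for Fig. 3(a).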

Finally, to test the convergence rate of the replicator dy- 
namics in more realistic (non-ergodic) fading conditions that 
do not possess a long-term stationary equilibrium, we also 
simulated in Fig. 4 channels that follow the well-known Jakes 
model for Rayleigh fading [37]. Specifically, we considered a 
$2 \times 2$ game with user velocities $v$ = 5 km/h and 15 km/h 
(Figs. 4(a) and 4(b) respectively) transmitting at a carrier 
frequency of $\nu$ = 2 GHz. We then ran the learning scheme (21) 
with a constant update period of $\delta$ = 3 ms, and we plotted the 
(normalized) power level of a single user against the evolving 
equilibrium power level (calculated at each step based on 
the instantaneous channel coefficients). Then, to quantify how 
well the users follow the system's evolving equilibrium, we 
calculated the cross-correlation of the two processes and the 
users' tracking delay (defined as the point of maximum cross- 
correlation). 
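The tracking-delay measurement described above can be sketched as follows. This is an illustrative stand-in rather than the procedure used for Fig. 4: the function name, the restriction to non-negative lags and the sinusoidal test signals are all assumptions made for the sake of a self-contained example.

```python
import math

def tracking_delay(reference, tracked, max_lag):
    """Lag (in samples) at which `tracked` best matches `reference`.

    Returns the lag in 0..max_lag that maximises the mean-removed
    cross-correlation between reference[i] and tracked[i + lag].
    """
    n = len(reference)
    mean_r = sum(reference) / n
    mean_t = sum(tracked) / n
    r = [x - mean_r for x in reference]
    t = [x - mean_t for x in tracked]
    best_lag, best_val = 0, float("-inf")
    for lag in range(max_lag + 1):
        val = sum(r[i] * t[i + lag] for i in range(n - lag)) / (n - lag)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

# Toy check: a slowly varying "equilibrium" and a copy delayed by 3 samples.
reference = [math.sin(2 * math.pi * i / 40) for i in range(400)]
tracked = [math.sin(2 * math.pi * (i - 3) / 40) for i in range(400)]
delay_samples = tracking_delay(reference, tracked, max_lag=10)
delay_ms = 3 * delay_samples  # 3 ms sampling step, as in the text
```

With a 3 ms update period, a peak at a lag of three samples corresponds to a 9 ms tracking delay, and a peak at two samples to 6 ms.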

Remarkably, Fig. 4 shows that the dynamics track the 
game's evolving equilibrium extremely closely. On average, 
users equilibrate within 9 ms for 5 km/h fading velocities, 
and within 6 ms for 15 km/h - meaning that users converge 
within 10-15% of the system's coherence time (108 ms and 
36 ms respectively). 
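The quoted coherence times are consistent with taking the coherence time to be the inverse of the maximum Doppler shift $f_D = v\nu/c$. This is our reading of the numbers, since the definition used is not stated in this section; the helper below is a hypothetical illustration of that arithmetic.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def coherence_time_ms(speed_kmh, carrier_hz):
    """Coherence time in ms, taken here as 1 / f_D (an assumed definition)."""
    speed_ms = speed_kmh / 3.6                       # km/h -> m/s
    doppler_hz = speed_ms * carrier_hz / SPEED_OF_LIGHT
    return 1000.0 / doppler_hz

t_slow = coherence_time_ms(5.0, 2e9)    # pedestrian speed, 2 GHz carrier
t_fast = coherence_time_ms(15.0, 2e9)
```

Under this definition the two values round to 108 ms and 36 ms, so the measured delays of 9 ms and 6 ms amount to roughly 8% and 17% of the coherence time, the same order as the range quoted above.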

Remark. For larger numbers of users (or channels), the results 
observed are similar; for instance, if the users in the previous 
game are increased to a few tens (we went up to K = 50), their 
tracking delay becomes longer but never exceeds 30-35% of 
the channel coherence time. However, due to space limitations, 
we opted to present here only the $2 \times 2$ case for simplicity. 

VI. Conclusions and Future Directions 

In this paper, we studied the distributed power allocation 
problem for orthogonal uplink channels by introducing a game 
which admits a convex potential function. For both static and 
fading channels, we found that the associated game admits 
a unique Nash equilibrium and we showed that a simple 
distributed learning scheme based on the replicator dynamics 
converges to equilibrium from (almost) any initial condition. 
In fact, by proving a general result for convex potential games, 
we showed that the speed of this convergence is exponential: 
users converge to an $\varepsilon$-neighborhood of an equilibrium in time 
which is at most of order $O(\log(1/\varepsilon))$. 

There are a number of important extensions of this work 
which demonstrate the strength of the replicator dynamics in 
continuous nonlinear games of this sort. First off, instead of 
the achievable rates $u_k$, one could consider energy-efficient 
metrics where users do not saturate their power constraints, 
e.g., when the price of transmission power might restrain users 
from transmitting at maximum power. More importantly, these 
techniques can be extended even to non-orthogonal channel 
models such as the multiple-input multiple-output (MIMO) MAC 
case where the game's strategy space consists of all positive-definite 
precoding matrices with constrained trace. Because 
of this nonlinear structure, the form (15) of the replicator 
dynamics no longer applies, but one can still write down 
a suitably modified matrix-valued replicator equation which 
allows users to converge to equilibrium. 

Appendix A 

Convergence Speed of the Replicator Dynamics 

Recall first that the solid tangent cone to $\Delta$ at $q$ is the set of 
rays starting at $q$ and intersecting $\Delta$ in at least one other point, 
i.e.: $T_q\Delta = \{z \in \mathbb{R}^{\Omega} : z_{k\alpha} \ge 0 \text{ for all } \alpha \in \mathcal{A}_k \text{ with } q_{k\alpha} = 0\}$. 
With this in mind, we have the following generalization of 
convexity: 

Definition 2. A function $F : \Delta \to \mathbb{R}$ will be called star-convex 
w.r.t. $q \in \Delta$ if $f(\theta) = F(q + \theta z)$ is convex and increasing for 
all $z \in T_q\Delta$ and for all $\theta \ge 0$ s.t. $q + \theta z \in \Delta$. 

Star-convex functions need not be convex, but strictly convex 
functions are star-convex w.r.t. their global minimum and 
weakly convex functions with a unique minimum are also 
star-convex; in particular, both $\Phi$ and $\overline{\Phi}$ are star-convex. For games 
with star-convex potentials, we then have: 

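Theorem 6. Let $\mathfrak{G} \equiv \mathfrak{G}(\mathcal{K}, \{\Delta_k\}, \{u_k\})$ be a game with a 
star-convex potential $F$. Then, the replicator dynamics (15) for the 
marginal utilities $\varphi_{k\alpha} = -\partial F/\partial p_{k\alpha}$ converge to $q$ for any initial 
condition that starts at finite K-L divergence $h_0 = D_{\mathrm{KL}}(q \,\|\, p(0))$ 
from $q$. Moreover, there exists $c > 0$ such that: 

$$ D_{\mathrm{KL}}(q \,\|\, p(t)) \le h_0\, e^{-ct} \quad \text{for all } t \ge 0. \tag{28} $$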

Our proof strategy will be to establish an inequality of the 
form $\frac{d}{dt} H_q(p(t)) \le -c\, H_q(p(t))$ and then employ Gronwall's 
lemma. To that end, we first define the "evolutionary index": 

$$ L_q(p) = -\sum\nolimits_{k,\alpha} (p_{k\alpha} - q_{k\alpha})\, \varphi_{k\alpha}(p), \tag{29} $$

so named because $q$ is evolutionarily stable iff $L_q(p) > 0$ near 
$q$. In fact, if we set $H_q(p) = D_{\mathrm{KL}}(q \,\|\, p)$, an easy calculation 
shows that $L_q$ is just (the negative of) the time derivative of $H_q$ 
w.r.t. the replicator dynamics (15): $\frac{d}{dt} H_q(p(t)) = \sum_{k,\alpha} \frac{\partial H_q}{\partial p_{k\alpha}}\, \dot p_{k\alpha} = 
-L_q(p(t))$. 
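For completeness, the "easy calculation" can be written out as follows. This is a sketch that assumes (15) has the standard replicator form $\dot p_{k\alpha} = p_{k\alpha}\,(\varphi_{k\alpha}(p) - \varphi_k(p))$ with $\varphi_k(p) = P_k^{-1} \sum_\beta p_{k\beta}\, \varphi_{k\beta}(p)$, and uses $\sum_\alpha q_{k\alpha} = P_k$:

```latex
\begin{align*}
\frac{d}{dt} H_q(p(t))
  &= -\sum_{k,\alpha} \frac{q_{k\alpha}}{p_{k\alpha}}\, \dot p_{k\alpha}
   = -\sum_{k,\alpha} q_{k\alpha} \bigl(\varphi_{k\alpha}(p) - \varphi_k(p)\bigr) \\
  &= -\sum_{k} \Bigl(\sum_{\alpha} q_{k\alpha}\, \varphi_{k\alpha}(p)
       - \sum_{\alpha} p_{k\alpha}\, \varphi_{k\alpha}(p)\Bigr)
   = \sum_{k,\alpha} (p_{k\alpha} - q_{k\alpha})\, \varphi_{k\alpha}(p)
   = -L_q(p).
\end{align*}
```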

We will therefore begin by showing that $L_q(p) > 0$ for all 
$p \in \Delta \setminus \{q\}$, implying that $H_q$ is Lyapunov for the replicator 
dynamics (15), and proving the convergence part of Theorem 
6. Indeed, if we fix some $z \in T_q\Delta$, (29) may be rewritten as 
$f'(\theta) = \sum_{k,\alpha} \frac{\partial F}{\partial p_{k\alpha}}\Big|_{q + \theta z} z_{k\alpha} = \theta^{-1} L_q(q + \theta z)$, for all $\theta > 0$ such that 
$q + \theta z \in \Delta$. Then, with $f(\theta)$ convex and increasing (Definition 
2), we obtain the estimate $L_q(p) = \theta f'(\theta) \ge f(\theta) - f(0) = 
F(p) - F(q)$, which shows that $L_q(p) > 0$ for all $p \neq q$. 

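To prove the convergence time estimate (28) we will need to 
show that $L_q$ grows linearly along directions which are not 
supported in $q$, and quadratically along those which are supported 
in $q$. To be specific, let $V_q = \{x \in \mathbb{R}^{\Omega} : x_{k\alpha} = 0 \text{ if } q_{k\alpha} = 0\}$ 
be the subspace of directions of $\mathbb{R}^{\Omega}$, $\Omega = \coprod_k \mathcal{A}_k$, which are 
supported in $q$, and let $V_q^{\perp}$ be its orthocomplement in $\mathbb{R}^{\Omega}$. 
Then, by decomposing $z \in \mathbb{R}^{\Omega}$ as $z = z_{\parallel} + z_{\perp}$ with $z_{\parallel} \in V_q$ and 
$z_{\perp} \in V_q^{\perp}$, we define the seminorms $\|\cdot\|_{\parallel}$ and $|\cdot|_{\perp}$ as: 

$$ \|z\|_{\parallel}^2 = \|z_{\parallel}\|_2^2 = \sum\nolimits^{\parallel}_{k,\alpha} z_{k\alpha}^2, \qquad |z|_{\perp} = \|z_{\perp}\|_1 = \sum\nolimits^{\perp}_{k,\alpha} |z_{k\alpha}|, \tag{30} $$

where the notation $\sum^{\parallel}$, $\sum^{\perp}$ is shorthand for summing over 
the directions of $V_q$ and $V_q^{\perp}$ respectively. We thus get: 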

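Lemma 1. Let $F : \Delta \to \mathbb{R}$ be star-convex w.r.t. $q \in \Delta$. Then: 

$$ L_q(p) \ge F(p) - F(q) \ge m\, |p - q|_{\perp} + \tfrac{1}{2}\, r\, \|p - q\|_{\parallel}^2, \tag{31} $$

where $m = \min_k \{\varphi_{k\alpha}(q) - \varphi_{k\beta}(q) : q_{k\beta} = 0,\ q_{k\alpha} > 0\}$, and $r$ is 
the minimum of the Rayleigh quotient $\langle z, M(q + z) z \rangle / \|z\|^2$ for 
the Hessian $M(p) = \nabla^2 F(p)$ of $F$, restricted over $T_q\Delta$. 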

Proof: Since $q$ minimizes $F$, the KKT conditions give 
$\varphi_{k\alpha}(q) = -\frac{\partial F}{\partial p_{k\alpha}}\Big|_q = -\lambda_k$ for all $\alpha \in \mathcal{A}_k$ such that $q_{k\alpha} > 0$ and 
$\varphi_{k\alpha}(q) \le -\lambda_k$ otherwise (where $\lambda_k$ denotes the complementary 
slackness Lagrange multiplier of $F$ over $\Delta$). Thus, a first order 
Taylor estimate with Lagrange remainder readily yields: 

$$ f(\theta) = f(0) + f'(0)\, \theta + \tfrac{1}{2} f''(\xi)\, \theta^2 \tag{32} $$

for some $\xi \in (0, \theta)$, so (31) will follow once we properly 
estimate the linear and quadratic terms of (32). 

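As far as the linear term of (32) is concerned, we will 
have $f'(0) = -\sum_{k,\alpha} \varphi_{k\alpha}(q)\, z_{k\alpha} = \sum^{\parallel}_{k,\alpha} \lambda_k z_{k\alpha} - \sum^{\perp}_{k,\alpha} \varphi_{k\alpha}(q)\, z_{k\alpha} = 
\sum^{\perp}_{k,\alpha} (-\lambda_k - \varphi_{k\alpha}(q))\, z_{k\alpha} \ge m\, |z|_{\perp}$, where the last equality holds 
because $\sum^{\parallel}_{\alpha} z_{k\alpha} = -\sum^{\perp}_{\alpha} z_{k\alpha}$ (recall that $z \in T_q\Delta$) and the last 
inequality is just the definition of $m$. Similarly, for any $\xi \in (0, \theta)$ 
and $z \in T_q\Delta$, we get $f''(\xi) = \langle z, M(q + \xi z) z \rangle = R_{q + \xi z}(z)\, \|z\|^2$, 
where $R_p(w) = \langle w, M(p) w \rangle / \|w\|^2$, $p \in \Delta$, $w \in T_p\Delta$, denotes the 
Rayleigh quotient of the Hessian $M$ of $F$. Hence, if $r$ is the 
minimum of $R_{q+w}(w)$ over the set $\{w \in T_q\Delta : q + w \in \Delta\}$, 
we will also have $f''(\xi) \ge r\, \|z\|^2$, and (31) follows by plugging 
the above into (32) and noting that $\|z\| \ge \|z\|_{\parallel}$. ■ 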



Obtaining similar estimates for the relative entropy function 
Hq is harder (after all, Hq blows up near the boundary of A), 
so we will need two more auxiliary lemmas: 

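Lemma 2. For all $z \in T_q\Delta \setminus \{0\}$ and for all $a > 1$, the equation 

$$ H_q(q + \theta z) = a\, |z|_{\perp}\, \theta + \tfrac{1}{2}\, a \sum\nolimits^{\parallel}_{k,\beta} \frac{z_{k\beta}^2}{q_{k\beta}}\, \theta^2 \tag{33} $$

admits a unique positive root $\theta_a \equiv \theta_a(z)$. Consequently: 

$$ H_q(q + \theta z) \le a\, |z|_{\perp}\, \theta + \tfrac{1}{2}\, a \sum\nolimits^{\parallel}_{k,\beta} \frac{z_{k\beta}^2}{q_{k\beta}}\, \theta^2 \quad \text{for all } \theta \le \theta_a(z). \tag{34} $$

Proof: Let $h(\theta) = H_q(q + \theta z)$ be the LHS of (33), and 
denote its RHS by $a\, g(\theta)$. Then, if we set $w(\theta) = h(\theta) - a\, g(\theta)$, 
we readily obtain $w(0) = 0$, $w'(0) = |z|_{\perp} (1 - a) \le 0$, and 
$w''(0) = \sum^{\parallel}_{k,\beta} (z_{k\beta}^2 / q_{k\beta})\, (1 - a) \le 0$, and the result follows by 
simple arguments relying on the mean value theorem. ■ 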

Lemma 3. Let $F : \Delta \to \mathbb{R}$ be star-convex w.r.t. $q \in \Delta$ and let 
$p(t)$ be a solution orbit of the replicator dynamics with initial 
relative entropy $h_0 = H_q(p(0))$. Then, there exists $b > 1$ s.t.: 

$$ H_q(p(t)) \le b\, |p(t) - q|_{\perp} + \frac{b}{2 q_0}\, \|p(t) - q\|_{\parallel}^2, \tag{35} $$

where $q_0 = \min_{k,\alpha} \{q_{k\alpha} : q_{k\alpha} > 0\}$. 

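Proof: Fix some $a > 1$. Then, by Lemma 2, we know 
that (33) admits a unique positive root $\theta_a(z)$, so let $h_a(z) = 
H_q(q + \theta_a(z) z)$ and set $h_a = \max\{h_a(z) : z \in S_q\}$, where 
$S_q = \{z \in T_q\Delta : z + q \in \Delta \text{ but } q + (1 + \varepsilon) z \notin \Delta \text{ for any } \varepsilon > 0\}$. 
Moreover, set $h_c = \max\{h_0, h_a\}$, let $\theta_c(z)$ be the unique 
positive root of the equation $H_q(q + \theta_c(z) z) = h_c$, and define 
$b(z) = h_c / g(\theta_c(z))$ with $g(\theta) = |z|_{\perp}\, \theta + \tfrac{1}{2} \sum^{\parallel}_{k,\beta} (z_{k\beta}^2 / q_{k\beta})\, \theta^2$ (as 
in the proof of Lemma 2). We will then have $b(z) \ge a$ 
since, otherwise, (34) would yield the contradiction $h_c = 
b(z)\, g(\theta_c(z)) < a\, g(\theta_c(z)) \le h(\theta_c(z)) = h_c$. 

With $b(z) > 1$, a second application of Lemma 2 yields 
$H_q(q + \theta z) \le b(z) \bigl(|z|_{\perp}\, \theta + \tfrac{1}{2} \sum^{\parallel}_{k,\beta} (z_{k\beta}^2 / q_{k\beta})\, \theta^2\bigr)$ for all $\theta \le \theta_c(z)$. 
Thus, if we decompose $p(t)$ as $p(t) = q + \theta(t) z(t)$ with $\theta(t) \ge 0$ 
and $z(t) \in S_q$, we will have $\theta(t) \le \theta_c(z(t))$; indeed, should 
this ever fail, we would have $H_q(p(t)) > b(z(t))\, g(\theta(t)) > 
b(z(t))\, g(\theta_c(z(t))) = h_c \ge h_0$, which contradicts the fact that $H_q$ 
is Lyapunov. Hence, with $\theta(t) \le \theta_c(z(t))$ for all $t \ge 0$, we get 
$H_q(p(t)) \le b(z(t)) \bigl(|z(t)|_{\perp}\, \theta(t) + \tfrac{1}{2} \sum^{\parallel}_{k,\beta} (z_{k\beta}^2(t) / q_{k\beta})\, \theta^2(t)\bigr)$, and (35) 
follows by taking $b = \max\{b(z) : z \in S_q\}$. ■ 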
Proof of Theorem 6: With notation as in Lemmas 1 and 
3, let $c = \min\{m/b,\ r q_0 / b\}$. We then get $L_q(p(t)) \ge m\, |p(t) - 
q|_{\perp} + \tfrac{1}{2}\, r\, \|p(t) - q\|_{\parallel}^2 \ge c\, H_q(p(t))$ and Gronwall's lemma yields 
$H_q(p(t)) \le h_0\, e^{-ct}$. Since the KKT inequalities for $F$ are strict 
along any direction of $\mathbb{R}^{\Omega}$ which is not supported in $q$, we will 
have $m > 0$ and, consequently, $c > 0$ as well. ■ 
Proof of Theorems 3 and 5: The potentials $\Phi$ and $\overline{\Phi}$ are 
star-convex, so both theorems follow from Theorem 6. ■ 
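The passage from the differential inequality to the $O(\log(1/\varepsilon))$ equilibration time can be spelled out in two lines (a short worked step, where $\varepsilon$ measures the K-L divergence from $q$):

```latex
\begin{align*}
\frac{d}{dt} \Bigl( e^{ct} H_q(p(t)) \Bigr)
  &= e^{ct} \Bigl( \tfrac{d}{dt} H_q(p(t)) + c\, H_q(p(t)) \Bigr) \le 0
  \;\Longrightarrow\; H_q(p(t)) \le h_0\, e^{-ct}, \\
h_0\, e^{-ct} \le \varepsilon
  &\iff t \ge \frac{1}{c} \log \frac{h_0}{\varepsilon}
   = O\bigl(\log(1/\varepsilon)\bigr).
\end{align*}
```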

All that remains is to calculate the value of $c$ when $q$ is 
strict. In that case, given that the intersection of $V_q$ with $T_q\Delta$ 
is trivial, the quadratic term of (31) can be ignored and we get 
$L_q(p) \ge \tfrac{1}{2} \sum_k \|p_k - q_k\|_1\, \Delta\varphi_k$, where $\Delta\varphi_k = \min_{\beta \neq \alpha_k} \{\varphi_{k,\alpha_k}(q) - 
\varphi_{k\beta}(q)\} > 0$. As for (35), we may decompose $p_k \in \Delta_k \setminus \{q_k\}$ as 
$p_k = q_k + \theta_k z_k$ where $z_k \in T_{q_k}\Delta_k$ has $z_{k,\alpha_k} = -P_k$. Thus, with 
$p_{k,\alpha_k} = P_k (1 - \theta_k)$, we readily obtain $H_q(p) = -\sum_k P_k \log(1 - 
\theta_k)$. Now, let $\theta_k'$ be defined by the equation $h_0 = H_{q_k}(q_k + \theta_k' z_k)$, 
i.e., $\theta_k' = 1 - \exp(-h_0 / P_k)$, implying that $-P_k \log(1 - \theta_k) \le 
h_0\, \theta_k / \theta_k'$ iff $0 \le \theta_k \le \theta_k'$ (because of convexity). We then claim 
that $H_q(p(t)) = -\sum_k P_k \log(1 - \theta_k(t)) \le h_0 \sum_k \theta_k(t) / \theta_k'$, where 
$\theta_k(t)$ is defined via the decomposition $p_k(t) = q_k + \theta_k(t)\, z_k(t)$. 

Indeed, if $\theta_k(t) > \theta_k'$ for some $t > 0$, then we would 
have $H_{q_k}(p_k(t)) > h_0$, and, hence, $H_q(p(t)) > H_q(p(0))$ as 
well, a contradiction (recall that $H_q(p(t))$ is decreasing). Thus, 
combining all the above, we only need pick $c$ such that 
$P_k\, \Delta\varphi_k \ge c\, h_0 / \theta_k'$, and the sharpest such choice is: 

$$ c = \min\nolimits_k \Bigl\{ \frac{P_k}{h_0}\, \bigl(1 - e^{-h_0 / P_k}\bigr)\, \Delta\varphi_k \Bigr\}. \tag{36} $$
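The convexity step used above can be made explicit. Writing $\psi_k(\theta) = -P_k \log(1 - \theta)$ (our notation, introduced only for this remark), $\psi_k$ is convex on $[0, 1)$ with $\psi_k(0) = 0$ and $\psi_k(\theta_k') = h_0$, so it lies below its chord on $[0, \theta_k']$:

```latex
\begin{align*}
\psi_k(\theta)
  &= \psi_k\Bigl( \bigl(1 - \tfrac{\theta}{\theta_k'}\bigr) \cdot 0
       + \tfrac{\theta}{\theta_k'} \cdot \theta_k' \Bigr)
   \le \bigl(1 - \tfrac{\theta}{\theta_k'}\bigr)\, \psi_k(0)
       + \tfrac{\theta}{\theta_k'}\, \psi_k(\theta_k')
   = \frac{h_0\, \theta}{\theta_k'},
   \qquad 0 \le \theta \le \theta_k', \\
L_q(p)
  &\ge \sum_k P_k\, \theta_k\, \Delta\varphi_k
   \ge c\, h_0 \sum_k \frac{\theta_k}{\theta_k'}
   \ge c\, H_q(p)
   \quad \text{whenever } P_k\, \Delta\varphi_k \ge \frac{c\, h_0}{\theta_k'} \text{ for all } k.
\end{align*}
```

The first inequality of the second line uses $\|p_k - q_k\|_1 = 2 P_k \theta_k$ for a strict equilibrium.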

Appendix B 

Stochastic Approximation of the Replicator Dynamics 

Proof of Theorem 4: Note first that $\Delta$ is invariant under 
the dynamics (22) if the $\delta(n)$ are chosen small enough. To 
see this, we will restrict ourselves w.l.o.g. to a game with one 
user and two choices, A and B (the general argument being 
similar). Thus, if we let $p_A(n) \equiv p_{1,A}(n)$ be the power that the 
user sends to channel A at the n-th iteration of the dynamics, 
we must find $\delta(n)$ such that $0 < p_A(n) < 1$ for all $n \ge 0$ and all 
possible $g_A(n) \ge 0$. So, assuming this holds for some $n \ge 0$, 
we get: 

$$ p_A(n+1) - p_A(n) = \delta(n)\, p_A(n)\, (1 - p_A(n)) \left[ \frac{g_A(n)}{\sigma_A^2 + g_A(n)\, p_A(n)} - \frac{g_B(n)}{\sigma_B^2 + g_B(n)\, (1 - p_A(n))} \right]. \tag{37} $$

The first term in the brackets of (37) is positive and the second is 
uniformly bounded, say by $M$, so $\delta(n) < 1/M$ yields $p_A(n + 1) > 
0$. The complementary inequality $p_A(n + 1) < 1$ then follows 
similarly, so, with $p(n) \in \Delta$ for all $n$, our theorem follows from 
Theorem 2 and Corollary 4 in Chap. 2 of [21]. ■ 
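The invariance argument can be checked numerically on the one-user, two-channel recursion (37). The sketch below normalises the user's total power to 1 and draws the channel gains from an exponential distribution; both choices, as well as the function names, are assumptions made for illustration. In this normalised setting $p\,(1 - p)\, g_B / (\sigma_B^2 + g_B (1 - p)) < p$, so any $\delta(n) < 1$ gives $p_A(n+1) > (1 - \delta(n))\, p_A(n) > 0$, and symmetrically for the upper bound.

```python
import random

def one_user_update(p_a, g_a, g_b, sigma2_a, sigma2_b, delta):
    """One step of the one-user, two-channel recursion (37), total power = 1."""
    drift = g_a / (sigma2_a + g_a * p_a) - g_b / (sigma2_b + g_b * (1.0 - p_a))
    return p_a + delta * p_a * (1.0 - p_a) * drift

def stays_interior(p0, delta, steps, seed=0):
    """True if the iterates remain strictly inside (0, 1) for random gains."""
    rng = random.Random(seed)
    p = p0
    for _ in range(steps):
        # exponentially distributed power gains (an assumed fading model)
        g_a, g_b = rng.expovariate(1.0), rng.expovariate(1.0)
        p = one_user_update(p, g_a, g_b, 1.0, 1.0, delta)
        if not 0.0 < p < 1.0:
            return False
    return True

interior = stays_interior(p0=0.5, delta=0.5, steps=30)
```

With equal gains and noise levels the bracket in (37) vanishes at the even split, so `one_user_update(0.5, 1.0, 1.0, 1.0, 1.0, 0.5)` leaves the power unchanged.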
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